Note: Open scripts.Rproj first, then script. To easily use relative paths, click the down button next to knit and then click “Knit Directory –> Project Directory”. This should make loading and saving files much easier.

Alpha-diversity Analysis

Alpha-diversity represents diversity within an ecosystem or a sample, in other words, what is there and how much is there in term of species. However, it is not easy to define a species and we can calculate alpha-diversity at different taxonomic levels.

Several alpha-diversity indices can be calculated. Within the most commonly used: - Richness represents the number of species observed in each sample. - Chao1 estimates the total richness. - Pielou’s evenness provides information about the equity in species abundance in each sample, in other words are some species dominating others or do all species have quite the same abundances. - Shannon index provides information about both richness and evenness.

Alpha-diversity is calculated on the raw data. It is important to not use filtered data because many richness estimates are modeled on singletons and doubletons in the occurrence table. So, you need to leave them in the dataset if you want a meaningful estimate. Moreover, we usually not using normalized data because we want to assess the diversity on the raw data and we are not comparing samples to each other but only assessing diversity within each sample.

Based heavily on tutorial: https://scienceparkstudygroup.github.io/microbiome-lesson/04-alpha-diversity/index.html

Red text indicates regions that require the user to modify. Regardless, the user should check over all code blocks to ensure that everything is running correctly.

1. Load packages

Load packages.

library(tidyverse)
library(maditr)
library(vegan)
library(patchwork)

2. Set variables

Change file names and condition variables where appropriate.

# Input samples metadata
metadata.file <- "samples.tsv"
# Input coverage file
cov.file <- "coverage.tsv.gz"
# Variable in `meatdata.file` to use for plotting
condition <- "environment"
# Color of `condition` vairables, use this order for plotting
metadata.color <- c("CreekBiofilm"="#3b8f43", "Endolithic"="#8559a5", "Soil"="#ee7f22")

3. Load, clean, and pre-processing datasets

Adjust filtering to remove MAGs which you don’t want used for diversity analysis.

data_otu  <- read.table(cov.file, header = TRUE, sep = '\t') %>%
  select(c("Sample", "Genome", "TPM")) %>%
  filter(! Genome %in% c("unmapped", 
                         "Cmer_10D", "Cmer_10D_chloroplast", "Cmer_10D_mitochondria", 
                         "Gsulp_YNP5587_1_nuclear", "Gsulp_YNP5587_1_chloroplast", "Gsulp_YNP5587_1_mitochondria")) %>%
  dcast(Sample~Genome) %>%
  column_to_rownames("Sample")
## Using 'TPM' as value column. Use 'value.var' to override
data_otu <- as.matrix(data_otu)
data_otu <- round(data_otu)
mode(data_otu) <- "integer"
data_grp  <- read.table(metadata.file, header = TRUE, sep = '\t') %>%
  select(c("sample_id", condition)) %>%
  rename(variable=environment) %>%
  mutate(variable=factor(variable, levels=names(metadata.color))) %>%
  column_to_rownames("sample_id")
## Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
## ℹ Please use `all_of()` or `any_of()` instead.
##   # Was:
##   data %>% select(condition)
## 
##   # Now:
##   data %>% select(all_of(condition))
## 
## See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

4. Indices calculation

Alpha-diversity is calculated on the raw data (here data_otu).

data_richness <- estimateR(data_otu)                                            # calculate richness and Chao1 using vegan package

data_evenness <- diversity(data_otu) / log(specnumber(data_otu))                # calculate evenness index using vegan package

data_shannon <- diversity(data_otu, index = "shannon")                          # calculate Shannon index using vegan package

data_alphadiv <- cbind(data_grp, t(data_richness), data_shannon, data_evenness) # combine all indices in one data table

rm(data_richness, data_evenness, data_shannon)                                  # remove the unnecessary data/vector

5. Visualization

P1 <- ggplot(data_alphadiv, aes(x=variable, y=S.obs)) +
  geom_boxplot(fill=metadata.color) +
  labs(title= 'Richness', x= ' ', y= '', tag = "A") +
  geom_point()

P2 <- ggplot(data_alphadiv, aes(x=variable, y=S.chao1)) +
  geom_boxplot(fill=metadata.color) +
  labs(title= 'Chao1', x= ' ', y= '', tag = "B") +
  geom_point()

P3 <- ggplot(data_alphadiv, aes(x=variable, y=data_evenness)) +
  geom_boxplot(fill=metadata.color) +
  labs(title= 'Eveness', x= ' ', y= '', tag = "C") +
  geom_point()

P4 <- ggplot(data_alphadiv, aes(x=variable, y=data_shannon)) +
  geom_boxplot(fill=metadata.color) +
  labs(title= 'Shannon', x= ' ', y= '', tag = "D") +
  geom_point()

(P1 | P2) / (P3 | P4)

Plot the four alpha-diversity indices (S.obs, S.chao1, data_shannon, data_evenness) for both sites.

pairs(data_alphadiv[,c(2,3,7,8)])

cor(data_alphadiv[,c(2,3,7,8)])
##                   S.obs   S.chao1 data_shannon data_evenness
## S.obs         1.0000000 0.8465567    0.9112042     0.7628950
## S.chao1       0.8465567 1.0000000    0.7564513     0.6225012
## data_shannon  0.9112042 0.7564513    1.0000000     0.9611769
## data_evenness 0.7628950 0.6225012    0.9611769     1.0000000

6. Statistical analyses

You can use different statistical tests in order to test if there is any significant differences between treatments: parametric tests (t-test and ANOVA) or non-parametric tests (Mann-Whitney and Kruskal-Wallis). Before using parametric tests, you need to make sure that you can use them (e.g. normal distribution, homoscedasticity). We will use parametric tests for the following analysis.

We will first test the effect of the sampling site on the Shannon index using one-factor ANOVA test.

summary(aov(data_shannon ~ variable, data = data_alphadiv))
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## variable     2 1.2268  0.6134   13.43 0.00199 **
## Residuals    9 0.4109  0.0457                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can interpret the results as following: - There is a significant effect of the ‘variable’: Pr(>F) = 0.00199 (P-value < 0.01)

7. Session Info

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.2.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
##  [1] patchwork_1.3.0 vegan_2.6-8     lattice_0.22-6  permute_0.9-7  
##  [5] maditr_0.8.4    lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1  
##  [9] dplyr_1.1.4     purrr_1.0.2     readr_2.1.5     tidyr_1.3.1    
## [13] tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9        utf8_1.2.4        generics_0.1.3    renv_1.0.11      
##  [5] stringi_1.8.4     hms_1.1.3         digest_0.6.37     magrittr_2.0.3   
##  [9] evaluate_1.0.1    grid_4.3.1        timechange_0.3.0  fastmap_1.2.0    
## [13] Matrix_1.6-5      jsonlite_1.8.9    mgcv_1.9-1        fansi_1.0.6      
## [17] scales_1.3.0      jquerylib_0.1.4   cli_3.6.3         crayon_1.5.3     
## [21] rlang_1.1.4       splines_4.3.1     munsell_0.5.1     withr_3.0.2      
## [25] cachem_1.1.0      yaml_2.3.10       tools_4.3.1       parallel_4.3.1   
## [29] tzdb_0.4.0        colorspace_2.1-1  vctrs_0.6.5       R6_2.5.1         
## [33] lifecycle_1.0.4   MASS_7.3-60.0.1   cluster_2.1.6     pkgconfig_2.0.3  
## [37] pillar_1.9.0      bslib_0.8.0       gtable_0.3.6      glue_1.8.0       
## [41] data.table_1.16.2 xfun_0.49         tidyselect_1.2.1  rstudioapi_0.17.1
## [45] knitr_1.49        farver_2.1.2      nlme_3.1-166      htmltools_0.5.8.1
## [49] labeling_0.4.3    rmarkdown_2.29    compiler_4.3.1